2 results
9 - Let the Data Do the Talking: Hypothesis Discovery from Large-Scale Data Sets in Real Time
-
- By Christopher Oehmen, Pacific Northwest National Laboratory, Scott Dowson, Pacific Northwest National Laboratory, Wes Hatley, Future Point Systems, Justin Almquist, Pacific Northwest National Laboratory, Bobbie-Jo Webb-Robertson, Pacific Northwest National Laboratory, Jason McDermott, Pacific Northwest National Laboratory, Ian Gorton, Pacific Northwest National Laboratory, Lee Ann McCue, Pacific Northwest National Laboratory
- Edited by Ian Gorton, Deborah K. Gracio
-
- Book:
- Data-Intensive Computing
- Published online:
- 05 December 2012
- Print publication:
- 29 October 2012, pp 235-257
-
- Chapter
Summary
Discovering Biological Mechanisms through Exploration
The availability of massive amounts of data in the biological sciences is forcing us to rethink the role of hypothesis-driven investigation in modern research. Soon thousands, if not millions, of whole-genome DNA and protein sequence data sets will be available thanks to continued improvements in high-throughput sequencing and analysis technologies. At the same time, high-throughput experimental platforms for gene expression, protein and protein fragment measurements, and others are driving experimental data sets to extreme scales. As a result, the biological sciences are undergoing a paradigm shift from hypothesis-driven to data-driven scientific exploration. In hypothesis-driven research, one begins with observations, formulates a hypothesis, then tests that hypothesis in controlled experiments. In a data-rich environment, however, one often begins with only a cursory hypothesis (for example, that some class of molecular components is related to a cellular process), which may require evaluating hundreds or thousands of specific hypotheses rapidly. Testing this many hypotheses through physical experiments is generally intractable. However, data can often be brought to bear to rapidly evaluate and refine these candidate hypotheses into a small number of testable ones. The amount of data required to discover and refine a hypothesis in this way often overwhelms conventional analysis software and hardware. Ideally, advanced hardware can help, but conventional batch-mode access models for high-performance computing are not amenable to real-time analysis in larger workflows. We present a model for a real-time, data-intensive hypothesis discovery process that unites parallel software applications, high-performance hardware, and visual representation of the output.
Contributors
-
- By Leonard A. Adler, Henrik Anckarsäter, L. Eugene Arnold, Philip J. Asherson, Russell Barkley, Joseph Biederman, Andrew D. Blackwell, Jessica Bramham, Thomas E. Brown, Richard Bruggeman, Jan K. Buitelaar, C. Keith Conners, Jonathan H. Dowson, Steve V. Faraone, Christopher Gibbins, Christopher Gillberg, I. Carina Gillberg, Ylva Ginsberg, Laurence L. Greenhill, Julia D. Hunter, Cornelis C. Kan, Ronald C. Kessler, Scott H. Kollins, J. J. Sandra Kooij, Johanna Krause, Jonna Kuntsi, Florence Levy, Stephen P. McDermott, Gráinne McLoughlin, Mitul A. Mehta, Asko Niemela, Eleni Paliokosta, Yannis Paloyelis, Vangelis Pappas, Patricia Quinn, Maria Råstam, Doris Ryffel, David Shaw, Seija Sirviö, Thomas Spencer, Lacramioara Spetie, Siegfried Tuinier, Fiona E. van Dijk, Anne M. D. N. van Lammeren, Wim J. C. Verbeeck, Margaret Weiss, Timothy E. Wilens, Kiriakos Xenitidis
- Edited by Jan K. Buitelaar, Cornelis C. Kan, Philip Asherson, Institute of Psychiatry, London
-
- Book:
- ADHD in Adults
- Published online:
- 04 April 2011
- Print publication:
- 03 March 2011, pp vii-ix
-
- Chapter